Multi-Source Domain Adaptation: A Causal View
نویسندگان
چکیده
This paper is concerned with the problem of domain adaptation with multiple sources from a causal point of view. In particular, we use causal models to represent the relationship between the features X and class label Y , and consider possible situations where different modules of the causal model change with the domain. In each situation, we investigate what knowledge is appropriate to transfer and find the optimal target-domain hypothesis. This gives an intuitive interpretation of the assumptions underlying certain previous methods and motivates new ones. We finally focus on the case where Y is the cause for X with changing PY and PX|Y , that is, PY and PX|Y change independently across domains. Under appropriate assumptions, the availability of multiple source domains allows a natural way to reconstruct the conditional distribution on the target domain; we propose to model PX|Y (the process to generate effect X from cause Y ) on the target domain as a linear mixture of those on source domains, and estimate all involved parameters by matching the target-domain feature distribution. Experimental results on both synthetic and real-world data verify our theoretical results. Traditional machine learning relies on the assumption that both training and test data are from the same distribution. In practice, however, training and test data are probably sampled under different conditions, thus violating this assumption, and the problem of domain adaptation (DA) arises. Consider remote sensing image classification as an example. Suppose we already have several data sets on which the class labels are known; they are called source domains here. For a new data set, or a target domain, it is usually difficult to find the ground truth reference labels, and we aim to determine the labels by making use of the information from the source domains. Note that those domains are usually obtained in different areas and time periods, and that the corresponding data distribution various due to the change in illumination conditions, physical factors related to ground (e.g., different soil moisture or composition), vegetation, and atmospheric conditions. Other well-known instances of this situation include sentiment data analysis (Blitzer, Dredze, and Pereira 2007) and flow cytometry data analysis (Blanchard, Lee, and Scott 2011). DA approaches have Copyright c © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. many applications in varies areas including natural language processing, computer vision, and biology. For surveys on DA, see, e.g., (Jiang 2008; Pan and Yang 2010; Candela et al. 2009). In this paper, we consider the situation with n source domains on which both the features X and label Y are given, i.e., we are given (x,y) = (x k , y (i) k ) mi k=1, where i = 1, ..., n, and mi is the sample size of the ith source domain. Our goal is to find the classifier for the target domain, on which only the features x = (xk) m k=1 are available. Here we are concerned with a difficult scenario where no labeled point is available in the target domain, known as unsupervised domain adaptation. Since PXY changes across domains, we have to find what knowledge in the source domains should be transferred to the target one. Previous work in domain adaptation has usually assumed that PX changes but PY |X remain the same, i.e., the covariate shift situation; see, e.g., (Shimodaira 2000; Huang et al. 2007; Sugiyama et al. 2008; Ben-David, Shalev-Shwartz, and Urner 2012). It is also known as sample selection bias (particularly on the features X) in (Zadrozny 2004). In practice it is very often that both PX and PY |X change simultaneously across domains. For instance, both of them are likely to change over time and location for a satellite image classification system. If the data distribution changes arbitrarily across domains, clearly knowledge from the sources may not help in predicting Y on the target domain (Rosenstein et al. 2005). One has to find what type of information should be transferred from sources to the target. One possibility is to assume the change in both PX and PY |X is due to the change in PY , while PX|Y remains the same, as known as prior probability shift (Storkey 2009; Plessis and Sugiyama 2012) or target shift (Zhang et al. 2013). The latter further models the change in PX|Y caused by a location-scale (LS) transformation of the features for each class. The constraint of the LS transformation renders PX|Y on the target domain, denoted by P t X|Y , identifiable; however, it might be too restrictive. Fortunately, the availability of multiple source domains provides more hints as to find P t X|Y , as well as P t Y |X . Several algorithms have been proposed to combine knowledge from multiple source domains. For instance, (Mansour, Mohri, and Rostamizadeh 2008) proposed to form the target hypothesis by combining source hypotheses with a distribution weighted rule. (Gao et al. 2008), (Duan et al. 2009), and (Chattopadhyay et al. 2011) combine the predictions made by the source hypotheses, with the weights determined in different ways. An intuitive interpretation of the assumptions underlying those algorithms would facilitate choosing or developing DA methods for the problem at hand. To the best of our knowledge, however, it is still missing in the literature. One of our contributions in this paper is to provide such an interpretation. This paper studies the multi-source DA problem from a causal point of view where we consider the underlying data generating process behind the observed domains. We are particularly interested in what types of information stay the same, what types of information change, and how they change across domains. This enables us to construct the optimal hypothesis for the target domain in various situations. To this end, we use causal models to represent the relationship between X and Y , because they provide a compact description of the properties of the change in the data distribution.1 They, for instance, help characterize transportability of experimental findings (Pearl and Bareinboim 2011) or recoverability from selection bias (Bareinboim, Tian, and Pearl 2014). As another contribution, we further focus on a typical DA scenario where both PY and PX|Y (or the causal mechanism to generate effect X from cause Y ) change across domains, but their changes are independent from each other, as implied by the causal model Y → X . We assume that the source domains contains rich information such that for each class, P t X|Y can be approximated by a linear mixture of PX|Y on source domains. Together with other mild conditions on PX|Y , we then show that P t X|Y , as well as P t Y , is identifiable (or can be uniquely recovered). We present a computationally efficient method to estimate the involved parameters based on kernel mean distribution embedding (Smola et al. 2007; Gretton et al. 2007), followed by several approaches to constructing the target classifier using those parameters. One might wonder how to find the causal information underlying the data to facilitate domain adaptation. We note that in practice, background causal knowledge is usually available, helping formulating how to transfer the knowledge from source domains to the target. Even if this is not the case, multiple source domains with different data distributions may allow one to identify the causal structure, since the causal knowledge can be seen from the change in data distributions; see e.g., (Tian and Pearl 2001). 1 Possible DA Situations and Their Solutions DA can be considered as a learning problem in nonstationary environments (Sugiyama and Kawanabe 2012). It is helpful to find how the data distribution changes; it provides the clues as to find the learning machine for the target domain. The causal model also describes how the components of the joint distribution are related to each other, which, for instance, gives a causal explanation of the behavior of semi-supervised learning (Schölkopf et al. 2012). Table 1: Notation used in this paper. X , Y random variables X , Y domains
منابع مشابه
Information-Theoretic Multi-view Domain Adaptation: A Theoretical and Empirical Study
Multi-view learning aims to improve classification performance by leveraging the consistency among different views of data. The incorporation of multiple views was paid little attention in the studies of domain adaptation, where the view consistency based on source data is largely violated in the target domain due to the distribution gap between different domain data. In this paper, we leverage...
متن کاملDeep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique given a wide amount of labeled data from similar attributes in different domains. In real-world applications, there is a huge number of data but almost more of them are unlabeled. It is effective in image classification where it is expensive and time-consuming to obtain adequate label data. We propose a novel method named DALRRL, which consists of deep ...
متن کاملSample-oriented Domain Adaptation for Image Classification
Image processing is a method to perform some operations on an image, in order to get an enhanced image or to extract some useful information from it. The conventional image processing algorithms cannot perform well in scenarios where the training images (source domain) that are used to learn the model have a different distribution with test images (target domain). Also, many real world applicat...
متن کاملInformation-theoretic Multi-view Domain Adaptation
We use multiple views for cross-domain document classification. The main idea is to strengthen the views’ consistency for target data with source training data by identifying the correlations of domain-specific features from different domains. We present an Information-theoretic Multi-view Adaptation Model (IMAM) based on a multi-way clustering scheme, where word and link clusters can draw toge...
متن کاملOnline Active Learning for Cost Sensitive Domain Adaptation
Active learning and domain adaptation are both important tools for reducing labeling effort to learn a good supervised model in a target domain. In this paper, we investigate the problem of online active learning within a new active domain adaptation setting: there are insufficient labeled data in both source and target domains, but it is cheaper to query labels in the source domain than in the...
متن کامل